
[5525939] Allow user to select target opset in MOQ #809

Merged

galagam merged 6 commits into NVIDIA:main from galagam:dev-gagam-opset-arg-in-moq on Jan 27, 2026

Conversation

galagam (Contributor) commented Jan 22, 2026

What does this PR do?

Type of change: new feature

Overview:

  • Allow the user to select the target opset
  • The minimum opset is determined by the quantization mode
  • Add tests in tests/unit/onnx/test_quantize_api.py
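The selection behavior these bullets and the tests below describe can be sketched as follows. The constants mirror values mentioned later in this thread (base minimum 19, int4 minimum 21, nvfp4 minimum 23), but the function name and exact control flow are reconstructed from the test names and the walkthrough's sequence diagram, not copied from the implementation.

```python
import warnings

# Constants mirroring what this PR adds in modelopt/onnx/utils.py;
# values are taken from this thread, the structure is illustrative.
BASE_MIN_OPSET = 19
QDQ_PRECISION_MIN_OPSET = {"int4": 21, "float4_e2m1fn": 23}

def select_target_opset(user_opset, original_opset, quantize_mode):
    """Pick the effective opset: honor the user's choice, never drop
    below the mode's minimum, and never downgrade the original model."""
    mode_min = QDQ_PRECISION_MIN_OPSET.get(quantize_mode, BASE_MIN_OPSET)
    if user_opset is None:
        # No user choice: keep the original opset unless it is too low.
        return max(original_opset, mode_min)
    if user_opset < mode_min:
        warnings.warn(
            f"opset {user_opset} is below the minimum {mode_min} required"
            f" for {quantize_mode}; upgrading to {mode_min}"
        )
        user_opset = mode_min
    # A requested opset below the model's original one is not applied
    # (per test_opset_below_original_uses_original).
    return max(user_opset, original_opset)
```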

Testing

Added unit tests
tests/unit/onnx/test_quantize_api.py::test_opset_below_minimum_upgrades_to_minimum[int8] PASSED [ 11%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_minimum_upgrades_to_minimum[fp8] PASSED [ 22%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_minimum_upgrades_to_minimum[int4] PASSED [ 33%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_original_uses_original[int8] PASSED [ 44%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_original_uses_original[fp8] PASSED [ 55%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_original_uses_original[int4] PASSED [ 66%]
tests/unit/onnx/test_quantize_api.py::test_opset_above_minimum[int8] PASSED [ 77%]
tests/unit/onnx/test_quantize_api.py::test_opset_above_minimum[fp8] PASSED [ 88%]
tests/unit/onnx/test_quantize_api.py::test_opset_above_minimum[int4] PASSED [100%]

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: Yes, the docs auto-update from the argparse help text
  • Did you update Changelog?: Yes

Additional Information

Requested as a workaround (WAR) for a Windows onnxruntime issue tracked in 5525939; regardless, it is a useful feature to have.

Summary by CodeRabbit

  • New Features

    • Added --opset CLI option enabling users to specify target ONNX opset version when quantizing models.
    • Automatic validation ensures the opset version is compatible with quantization requirements, with warnings when adjustments are made.
  • Tests

    • Added comprehensive test coverage for opset version handling across quantization workflows.


@galagam galagam requested review from a team as code owners January 22, 2026 19:12
coderabbitai bot (Contributor) commented Jan 22, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough


This change introduces opset version control to ONNX quantization, allowing users to specify target ONNX opset versions via CLI --opset flag. The system automatically determines minimum required opset based on quantization precision requirements and validates or upgrades user-specified values accordingly.

Changes

Cohort / File(s) and summary:

  • CLI Interface (modelopt/onnx/quantization/__main__.py): Added the --opset argument to the parser with help text describing target opset behavior; wired the argument into the quantize() call.
  • Configuration (setup.py): Upgraded the optional onnxruntime dependency pins from v1.22.0 to v1.23.0 for both standard and GPU variants.
  • Documentation (CHANGELOG.rst): Added a changelog entry for the new --opset CLI option.
  • Opset Utilities (modelopt/onnx/utils.py): Introduced the BASE_MIN_OPSET constant (value: 19), the QDQ_PRECISION_MIN_OPSET mapping, get_qdq_precisions() to detect Q/DQ precision types in a model, and get_min_opset_for_precisions() to compute the maximum required opset for a given set of precisions.
  • FP16/BF16 Conversion (modelopt/onnx/autocast/convert.py): Added an opset parameter to the convert_to_f16() signature; implemented dynamic min_opset computation based on precision requirements; replaced the hardcoded min_opset (21) with the computed value; added validation and auto-upgrade logic with warnings.
  • Quantization Core (modelopt/onnx/quantization/quantize.py): Added an opset parameter to quantize() and _preprocess_onnx(); implemented opset validation, minimum-requirement checking, and automatic upgrading; threads opset through the preprocessing and quantization paths; converts the model to the target opset via onnx.version_converter when needed.
  • FP8 Quantization (modelopt/onnx/quantization/fp8.py): Added an opset parameter to quantize(); propagates opset to the convert_to_f16() call.
  • INT8 Quantization (modelopt/onnx/quantization/int8.py): Added an opset parameter to quantize(); propagates opset to both quantize_static() and convert_to_f16() calls.
  • Test Coverage (tests/unit/onnx/test_quantize_api.py): Added three parameterized test functions (test_opset_below_minimum_upgrades_to_minimum, test_opset_below_original_uses_original, test_opset_above_minimum) validating opset handling across quantization modes (int8, fp8, int4); includes a MIN_OPSET mapping defining per-mode minimums.
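The two utilities listed for modelopt/onnx/utils.py can be pictured roughly as below. The numeric values come from the review comments in this thread (base 19, int4 at 21, bf16 at 22, nvfp4 at 23); the exact mapping keys are an assumption, not copied from the source.

```python
# Values taken from this thread's review comments; keys are illustrative.
BASE_MIN_OPSET = 19
QDQ_PRECISION_MIN_OPSET = {
    "int4": 21,            # INT4 Q/DQ needs opset 21
    "bfloat16": 22,        # bf16 high-precision dtype needs opset 22
    "float4_e2m1fn": 23,   # NVFP4 needs opset 23
}

def get_min_opset_for_precisions(precisions):
    """Compute the largest minimum opset required by any detected Q/DQ
    precision, falling back to the base minimum for unlisted precisions."""
    return max(
        (QDQ_PRECISION_MIN_OPSET.get(p, BASE_MIN_OPSET) for p in precisions),
        default=BASE_MIN_OPSET,
    )
```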

Sequence Diagram

sequenceDiagram
    participant User as User/CLI
    participant Quantizer as quantize()
    participant Preprocessor as _preprocess_onnx()
    participant OpsetUtil as Opset Utils
    participant Converter as convert_to_f16()
    participant Model as ONNX Model

    User->>Quantizer: quantize(model, opset=X)
    Quantizer->>Preprocessor: _preprocess_onnx(model, opset=X)
    
    Preprocessor->>OpsetUtil: get_qdq_precisions(model)
    OpsetUtil-->>Preprocessor: precision_set
    
    Preprocessor->>OpsetUtil: get_min_opset_for_precisions(precision_set)
    OpsetUtil-->>Preprocessor: mode_min_opset
    
    alt opset provided
        Preprocessor->>Preprocessor: validate opset vs mode_min_opset
        alt opset < mode_min_opset
            Preprocessor->>Preprocessor: warn & upgrade to mode_min_opset
        end
    else opset not provided
        Preprocessor->>Preprocessor: select max(original_opset, mode_min_opset)
    end
    
    Preprocessor->>Model: convert to target_opset
    Preprocessor-->>Quantizer: preprocessed_model
    
    Quantizer->>Converter: convert_to_f16(model, opset=target_opset)
    Converter->>OpsetUtil: get_qdq_precisions(model)
    Converter->>OpsetUtil: get_min_opset_for_precisions(precision_set)
    Converter->>Converter: compute min_opset with validation
    Converter-->>Quantizer: converted_model
    
    Quantizer-->>User: quantized_model

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 3 passed

  • Docstring Coverage: ✅ Passed. Docstring coverage is 93.33%, which is sufficient; the required threshold is 80.00%.
  • Title check: ✅ Passed. The title '[5525939] Allow user to select target opset in MOQ' clearly and specifically describes the main feature addition: enabling users to select the target ONNX opset in MOQ (Model Optimizer for Quantization).
  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.




coderabbitai bot left a comment
Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/onnx/quantization/quantize.py (1)

440-521: Pass the effective target opset downstream.

_preprocess_onnx may upgrade the model's opset via onnx.version_converter.convert_version(), but quantize_int8 and quantize_fp8 receive the raw user opset input. This creates a mismatch: the quantizers' opset parameter may diverge from the actual model opset after preprocessing. Pass get_opset_version(onnx_model) (already imported) instead.

🔧 Suggested fix
     ) = _preprocess_onnx(
         onnx_path,
         use_external_data_format,
         output_path,
@@
         simplify,
         quantize_mode,
         opset,
     )
+    target_opset = get_opset_version(onnx_model)
@@
-            opset=opset,
+            opset=target_opset,
             **kwargs,
         )
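The mismatch the reviewer describes can be seen with a toy stand-in for the model object (names here are illustrative, not the real modelopt API): preprocessing may raise the model's opset past the user's request, so downstream callers should re-read the model's opset rather than reuse the raw argument.

```python
class ToyModel:
    """Stand-in for an ONNX ModelProto carrying only an opset version."""
    def __init__(self, opset):
        self.opset = opset

def preprocess(model, user_opset, mode_min_opset=21):
    # Preprocessing may silently upgrade the opset past the user's request.
    model.opset = max(user_opset or model.opset, mode_min_opset)
    return model

model = preprocess(ToyModel(opset=17), user_opset=19)
# Reusing user_opset (19) downstream would diverge from the real model
# opset (21); querying the model itself gives the effective target opset.
effective_opset = model.opset
```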
🤖 Fix all issues with AI agents
In `@modelopt/onnx/autocast/convert.py`:
- Around line 222-235: The docstring for the conversion routine is out of sync
with the implementation: low_precision_type uses a base_min_opset of 19 for
"fp16" (see base_min_opset and low_precision_type in convert.py) but the
docstring still claims 13; update the docstring to state that the default
minimum opset for fp16 is 19 (and bf16 is 22) and keep the note that Q/DQ nodes
may require increasing the opset (e.g., FP8/INT4/NVFP4) so the documentation
matches the logic in get_qdq_precisions, get_min_opset_for_precisions and the
base_min_opset assignment.

In `@modelopt/onnx/quantization/__main__.py`:
- Around line 289-297: Update the help text for the "--opset" argument added via
argparser.add_argument to include bf16 minimum opset info: mention that
--high_precision_dtype bf16 may require opset 22 (in addition to the existing
note about 19 for fp16, 21 for int4, 23 for nvfp4), and make the same change to
the other duplicate "--opset" help string found elsewhere in this module; ensure
the message clearly states that opset may be automatically increased if required
by operations.

In `@modelopt/onnx/quantization/quantize.py`:
- Around line 124-130: The min-opset lookup uses exact keys so variants like
"int4_awq" fall back to BASE_MIN_OPSET; normalize quantize_mode before querying
QDQ_PRECISION_MIN_OPSET: compute a normalized_mode (e.g., if "int4" in
quantize_mode -> "int4", if "nvfp4" in quantize_mode or "float4" variant ->
"float4_e2m1fn", etc.), then use QDQ_PRECISION_MIN_OPSET.get(normalized_mode,
BASE_MIN_OPSET) when setting mode_min_opset; update references in quantize.py
that use quantize_mode (including the substring checks and the get_opset_version
flow) to use the normalized value so variants resolve to the correct minimum
opset.

In `@modelopt/onnx/utils.py`:
- Around line 702-731: get_qdq_precisions currently only inspects
DequantizeLinear nodes with Constant inputs and misses QuantizeLinear nodes and
non-constant/Variable paths (activations), causing under-reporting of
precisions; update get_qdq_precisions to also iterate QuantizeLinear nodes and
extract precision from their output_dtype attribute where present, and for both
QuantizeLinear and DequantizeLinear handle Variable inputs by resolving the
tensor type via the graph/model value_info or node.output type (e.g., check
graph.value_info / model.graph.value_info / model.graph.input/output types for
the corresponding tensor and use its elem_type/name), while still keeping the
existing Constant-path logic (Constant.values.dtype.name) and preserving
detection of custom nodes like TRT_FP4DynamicQuantize.
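The review's point about scanning both QuantizeLinear and DequantizeLinear nodes can be sketched with stand-in node objects. The real code operates on ONNX protobuf graphs and resolves dtypes from Constant inputs and value_info; the Node class and attribute names here are purely illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Minimal stand-in for an ONNX NodeProto (illustrative only)."""
    op_type: str
    attrs: dict = field(default_factory=dict)

def get_qdq_precisions(nodes):
    """Collect Q/DQ precision names from both QuantizeLinear and
    DequantizeLinear nodes, per the review suggestion, while keeping
    detection of custom nodes like TRT_FP4DynamicQuantize."""
    precisions = set()
    for node in nodes:
        if node.op_type == "QuantizeLinear" and "output_dtype" in node.attrs:
            precisions.add(node.attrs["output_dtype"])
        elif node.op_type == "DequantizeLinear" and "input_dtype" in node.attrs:
            # The real code resolves this from the Constant input's dtype
            # or the graph's value_info; an attribute is used here only
            # to keep the sketch self-contained.
            precisions.add(node.attrs["input_dtype"])
        elif node.op_type == "TRT_FP4DynamicQuantize":
            precisions.add("float4_e2m1fn")
    return precisions
```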
🧹 Nitpick comments (1)
tests/unit/onnx/test_quantize_api.py (1)

29-34: Avoid hard‑coded min‑opset duplication in tests.

To reduce drift, consider deriving these values from the production constants.

♻️ Suggested refactor
-from modelopt.onnx.utils import get_opset_version
+from modelopt.onnx.utils import BASE_MIN_OPSET, QDQ_PRECISION_MIN_OPSET, get_opset_version
@@
 MIN_OPSET = {
-    "int8": 19,
-    "fp8": 19,
-    "int4": 21,
+    "int8": BASE_MIN_OPSET,
+    "fp8": BASE_MIN_OPSET,
+    "int4": QDQ_PRECISION_MIN_OPSET["int4"],
 }

Comment thread modelopt/onnx/autocast/convert.py
Comment thread modelopt/onnx/quantization/__main__.py
Comment thread modelopt/onnx/quantization/quantize.py
Comment thread modelopt/onnx/utils.py
codecov bot commented Jan 22, 2026

Codecov Report

❌ Patch coverage is 89.83051% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.02%. Comparing base (1c7a928) to head (94e574b).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/utils.py 86.95% 3 Missing ⚠️
modelopt/onnx/autocast/convert.py 84.61% 2 Missing ⚠️
modelopt/onnx/quantization/quantize.py 95.65% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #809      +/-   ##
==========================================
+ Coverage   73.33%   74.02%   +0.69%     
==========================================
  Files         192      192              
  Lines       19613    19664      +51     
==========================================
+ Hits        14383    14557     +174     
+ Misses       5230     5107     -123     

☔ View full report in Codecov by Sentry.

Comment thread setup.py Outdated
@galagam galagam force-pushed the dev-gagam-opset-arg-in-moq branch from 9882089 to 8e80cee Compare January 25, 2026 07:50
@galagam galagam changed the title [5525939] Allow user to select target opset in MOQ; upgrade onnxruntime [5525939] Allow user to select target opset in MOQ Jan 25, 2026
@galagam galagam force-pushed the dev-gagam-opset-arg-in-moq branch from a864ab4 to 8cdbe62 Compare January 25, 2026 09:17
Comment thread modelopt/onnx/autocast/convert.py Outdated
@gcunhase gcunhase self-requested a review January 26, 2026 17:34
gcunhase (Contributor) left a comment

LGTM, thanks.

@galagam galagam force-pushed the dev-gagam-opset-arg-in-moq branch from 6f04093 to 08f961f Compare January 26, 2026 21:06
@galagam galagam enabled auto-merge (squash) January 26, 2026 21:06
Comment thread tests/unit/onnx/test_quantize_api.py Outdated
@galagam galagam force-pushed the dev-gagam-opset-arg-in-moq branch from 08f961f to acc8939 Compare January 27, 2026 11:55
galagam (Contributor, Author) commented Jan 27, 2026

Tests failing - depends on #819 to fix CI

@galagam galagam force-pushed the dev-gagam-opset-arg-in-moq branch from acc8939 to 3beb294 Compare January 27, 2026 15:08
copy-pr-bot bot commented Jan 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

- Allow user to select the target opset
- Minimum opset will be defined according to quantization mode
- Upgrade onnxruntime to 1.23.0 to support nvfp4
- Add tests in tests/unit/onnx/test_quantize_api.py

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
galagam and others added 2 commits January 27, 2026 07:10
Co-authored-by: Gwena Cunha <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
@galagam galagam force-pushed the dev-gagam-opset-arg-in-moq branch from 3beb294 to 94e574b Compare January 27, 2026 15:10
@galagam galagam merged commit 2c73de0 into NVIDIA:main Jan 27, 2026
37 checks passed
danielkorzekwa pushed a commit that referenced this pull request Feb 17, 2026
danielkorzekwa pushed a commit that referenced this pull request Mar 4, 2026
